DOCUMENT RESUME



ED 310 130 TM 013 703 

AUTHOR Pardo, Elly; Russell, Joanne Collins

TITLE Standardized Tests at the Early Childhood Level: Do 

They Tell Us the Truth about Children? 
PUB DATE Mar 89 

NOTE 45p.; Paper presented at the Annual Meeting of the

American Educational Research Association (San 

Francisco, CA, March 27-31, 1989). 
PUB TYPE Reports - Research/Technical (143) — 

Speeches/Conference Papers (150) — Tests/Evaluation 

Instruments (160) 

EDRS PRICE MF01/PC02 Plus Postage. 

DESCRIPTORS Academic Achievement; *Achievement Tests; Children;

*Early Childhood Education; *Elementary School
Students; Followup Studies; Grade 1; Grade 2;
Grouping (Instructional Purposes); High Risk
Students; Remedial Reading; *Standardized Tests;
*Test Validity; *Writing Evaluation

IDENTIFIERS *Metropolitan Achievement Tests

ABSTRACT 

A small follow-up study of three male and five female 
children, who were at risk of school failure and were enrolled in a 
model developmental Early Learning Center in the Boston 
(Massachusetts) Public Schools, is described. The focus was on 
determining whether a standardized achievement measure used to 
evaluate student academic success/progress accurately reflected the
level of mastery attained by the students. Despite their satisfactory
performance in reading/language arts and mathematics, most of the
students scored below the 40th percentile when tested at the end of
the school year on the Metropolitan Achievement Test (MAT6) in
reading and mathematics. The students' second-grade teachers were
interviewed to determine their assessment of the students'
performance in vocabulary, word-recognition, reading comprehension, and
writing skill areas. The students' journals were rated by independent
judges, and the first-grade achievement of follow-up students on the
school department's reading/language arts and writing
curriculum-referenced tests was compared with their first-grade
achievement on the MAT6 reading test. The MAT6 did not articulate
well with other measures of the students' first-grade 
reading/language arts skills or their second grade reading/language 
arts school performance. The use of a single test score to classify 
children for special educational services, particularly in the early 
childhood years, is questioned. Discrepancies have arisen between the 
format for classroom learning at the early childhood level and the 
format for measuring learning. Five data tables are included; and 
interrater reliability coefficients, the student follow-up survey, 
transcription rules, samples of transcribed texts, and the writing 
sample evaluation instrument are appended. (TJH) 


* Reproductions supplied by EDRS are the best that can be made 

* from the original document. 






Standardized Tests at the Early Childhood Level: 
Do They Tell Us the Truth about Children?^



Elly Pardo and Joanne Collins Russell
Boston Public Schools 



Paper prepared for delivery at the annual meeting of the American Educational Research 
Association, San Francisco, CA, March 27-31, 1989.



^ Viewpoints expressed in this document are solely those of the authors. Reprint requests
can be directed to: Dr. Elly Pardo, Department of Program Evaluation, Boston Public
Schools, 26 Court St, Boston, MA 02108, (617) 726-6200 x5793, or Joanne Collins Russell,
Early Learning Center, 50 Beechcroft Street, Brighton, MA 02135, (617) 254-6672.






HIGHLIGHTS



This paper describes a small follow-up study of eight children
enrolled in a model, developmental Early Learning Center (ELC) in
the Boston Public Schools (BPS). During the 1987-1988 school
year, the children in question were assigned to a first-grade
classroom in which learning occurred through direct, hands-on
experiences. Despite their average or above average classroom 
performance in reading/language arts and math, the majority of 
youngsters scored below the 40th percentile when tested at the 
end of the school year on the Metropolitan Achievement Test 
(MAT6) in Reading and Math. Given that the 40th percentile is 
the point used by the Boston Public Schools to identify students 
who are potentially at risk of academic failure, it was decided to 
investigate whether the MAT6 had accurately reflected the level 
of academic attainment and learning potential of the students 
under study. 

The analyses in this document address only the area of reading/
language arts. They point to the mismatch that has occurred 
between the assessment of student skills by a standardized 
achievement test and the assessment of those same skills by other 
indicators of student academic performance. 



THEORETICAL BACKGROUND: CHILD LEARNING THEORY

• Children learn best when they acquire information at their own
pace through direct, hands-on experiences. 

• Children construct their notions of the world by physically 
interacting with it and not by passively observing or reacting 
to things and events in their environment. 

• Standardized achievement tests evaluate skills through an
abstract, decontextualized format that is based entirely upon
symbolic reasoning. This format is incompatible with the way 
young children naturally process information. 

• Even though students may be 'test-ready,' that is, they may 
have mastered good test-taking skills, they may not have developed
the theoretical constructs necessary for them to see
relationships among ideas or to apply information to a variety
of learning environments. 

• When imitation is taught at the expense of experiential learn- 
ing, complex reasoning and problem-solving skills that are 
critical for higher-order thinking are bypassed. 






METHODS 



• In a structured telephone interview conducted in February of 
1989, second-grade teachers of follow-up pupils were asked to 
rate the classroom performance of these pupils in skills areas 
pertaining to vocabulary, word-recognition, reading comprehen- 
sion, and writing. 

• Journals produced by ELC follow-up students in April of 1988
were independently rated by two first-grade teachers from 
schools located in the same geographical area. The teachers 
were using instructional techniques in reading/language arts 
that articulated well with the format of the Metropolitan 
Achievement Reading Test — that is, they were teaching pri- 
marily through a basal reading series, workbooks, and topic- 
specific worksheets. 

• The first-grade achievement of follow-up students on the school 
department's Reading/Language Arts and Writing Curriculum-Referenced
Tests was compared with their first-grade achievement
on the MAT6 Reading test. Teacher judgements of pupil
performance in reading, language arts, and writing were also 
compared for both first and second grades. 



FINDINGS 

• Outcomes on the Metropolitan Achievement Test in Reading for
follow-up pupils show that the instrument did not articulate
well with other measures of their first-grade reading/language
arts skills level or their second-grade reading/language arts
school performance.

• The school department's Grade 1 Curriculum-Referenced Test in
Reading/Language Arts provided a more realistic picture of the
second-grade reading/language arts performance of follow-up
students than did the MAT6 in Reading. This is largely because
test items on the Reading/Language Arts CRT not only match the
content of the curriculum for the BPS, but they also assess
student knowledge appropriate for a specific grade level. Items
on the Metropolitan Achievement Tests, however, are based upon
information in curriculum guides used nationally, and the
material on these tests spans several grade levels.

• Second-grade teacher ratings of student reading, language arts,
and writing school performance revealed that all the youngsters
in question were on grade level in these skills areas. In
fact, even though they qualified for and were receiving Chapter
1 (federally-funded) remediation services, some of the youngsters
were judged by their teachers to be at the top of their
class in literacy skills.






• First-grade teacher ratings of student journals were highly
favorable and also bore some resemblance to the ratings of student
writing ability by second-grade teachers.

• First-grade CRT Writing scores for observed students generally
were consistent with teacher evaluations of their
writing ability in both the first and second grade. 

IMPLICATIONS OF FINDINGS

• The data in this study suggest that an assessment tool that
measures skills taught beyond a child's actual grade level may 
be categorizing more students as being at educational risk than 
is actually the case. 

• The data also raise questions about the use of a single test
score to label or classify children for special educational 
services, particularly in the early childhood years when 
the growth patterns of youngsters are uneven. 

• Finally, the study draws attention to potential problems that
have arisen between the format for classroom learning at the 
early childhood level and the format for measuring learning 
achievement. More specifically, the contents of the document 
set forth a dilemma between current theory on the most educa- 
tionally-sound practices in early childhood education and the 
use of a standardized achievement test to successfully capture 
the impact of those practices. 






INTRODUCTION 



In this paper we describe a small pilot study investigating
whether a standardized achievement test, which measures discrete 
items of knowledge for specific content areas, can successfully 
capture the abilities of young children acquiring skills through 
an integrated, discovery-oriented approach to learning. We
suggest that because standardized measures retrieve information 
in ways that do not parallel the developmental learning processes 
of young children, they may be better indicators of good test- 
taking skills than of actual knowledge mastery. Evidence for our 
contentions comes from psychological learning theory (Piaget, 
1965, 1970), observational data on the classroom of eight
first-grade students enrolled in a developmental education Early
Learning Center (ELC), Metropolitan Achievement Test (MAT6)
scores in reading for the eight observed students, analyses of 
student products, and follow-up data on student classroom perfor- 
mance. 

Children attending the Early Learning Center in question were 
participating in a model developmental education program for both 
the Boston Public Schools and the State of Massachusetts. Consequently,
a number of school-department administrators considered
it important to conduct follow-up research on ELC first
graders who had performed poorly on the MAT6 Reading test but who 
had demonstrated school academic skills that enabled them to be 
promoted to the second grade. 

For the Boston Public Schools, poor performance on the Metropolitan
Achievement Test is indicated by a score that falls at or
below the 40th percentile. This percentile rank is highly important
within the school department for three reasons: First, it
is the point at which students are identified as being at educational
risk (that is, they are labeled as potentially in danger
of failing school); second, it is the point which determines pupil
eligibility for Chapter 1 remediation services;^ and third, it
is the point which flags a school, its instructional programs,
its teachers, and its administrators as not having met the school
system's academic standards.

The overarching purpose of a study that contrasts test perfor- 
mance with school achievement is to determine the level of com- 
patibility between a standardized achievement measure used to 
evaluate student academic success and other indicators of student 
academic progress. The study described here not only examines 
this question but also considers a related concern, which is the 
extent to which a standardized test score can reliably account 
for a pupil's actual and potential academic attainment. 



^ Chapter 1 is a federal entitlement program for low achieving 
students. 



The discussion that follows provides descriptive information on 
the first-grade learning environment of follow-up pupils as well 
as comparative data on student test scores and teacher evalua- 
tions of pupil academic performance. Analyses are presented 
within both a qualitative and quantitative framework and address 
the mismatch that has occurred between student achievement and 
the method used to assess instructional progress. 



THE INSTRUCTIONAL CONTEXT

During the 1987-1988 school year, the students in question 
attended a first-grade classroom in a full-day developmental 
education program having before- and after-school daycare services.
The philosophical assumption underlying instruction in a
developmental model is that children learn best when they acquire 
information at their own pace through direct, hands-on exper- 
iences (NAEYC, 1988). These experiences, often individualized, 
are thought to allow preschoolers sufficient room for explora- 
tion, creativity, and the formation of good critical-thinking and 
reasoning skills that are appropriate to their developmental 
(cognitive) age. 

Intrinsic to a developmental model is an instructional process 
through which child and teacher work together to plan and carry 
out the day's activities. Because it is widely accepted that 
young children acquire information holistically and not as isolated
and discrete units (Piaget and Inhelder, 1969), the developmental
process is based upon an integrated model of learning.
This model presents activities as ongoing and interrelated (Holdaway,
1985). Themes thus become extremely important vehicles
for linking various skills areas. In this way, the theme of
"Beginnings" might include tasks such as planting a garden or
squeezing lemons to make lemonade. These tasks, in turn, might 
teach a range of skills such as oral language development, count- 
ing and measuring, fine-motor coordination, and social cooper- 
ation. 

Instruction in the first-grade classroom under study adhered
closely to the developmental framework described above. Never- 
theless, what distinguished learning in this classroom from many 
other first-grade classrooms was the noticeable lack of workbook 
and worksheet activities. Paper and pencil tasks served pri- 
marily for children to record their own discoveries and to com- 
ment on experiences by means of an inventive spelling technique.^ 



^ The philosophical assumption underlying the theory of inventive
spelling is that children learn to write best by developing their 
own guidelines of spelling and grammar. In time, as these 
children learn more about the adult rules of the language, their 
child rules will be replaced by adult forms. 






Journal writing was therefore an extremely important component of
the curriculum; it occurred daily and encouraged children to
synthesize information, to reflect critically on their experiences
from multiple points of view, to relate classroom activities
to one another, and to organize their ideas into a coherent
text. In the Discussion of Data we elaborate on the relationship of
the reading-writing connection to student performance outcomes.



THE THEORETICAL CONTEXT 

Current research on child learning has confirmed what researchers 
such as Jean Piaget (1954, 1965) had proposed much earlier - that 
children construct their notions of the world by physically 
interacting with it and not by passively observing or reacting to 
things and events in their environment. In fact, even around age
six, when youngsters acquire the capacity to think symbolically
(they can use words and numbers to represent objects and relations),
their thoughts are still tied to concrete reference
points. As a result, even though a child of six or seven does
not have to touch or move things in order to hold them in
thought, his or her representation of objects and events and the
relations among them still must derive from his or her own experiences
with things and situations in the world (cf. Brown,
Collins, and Duguid, 1989).

We argue here that because cognitive growth in young children
occurs extremely rapidly and is subject to considerable variabil- 
ity within generally-recognized sequences of development, it is 
often difficult to obtain reliable and valid results from stan- 
dardized achievement tests (cf. Cohen, 1988; Meisels, 1984). 
Particularly when a test is structured to tap skills mastery 
through an abstract, decontextualized format, a mismatch occurs 
between the way young children naturally internalize information 
and the unnatural method chosen for them to communicate what they 
know. 

If learning takes place primarily through workbooks and work- 
sheets, students are acquiring information by rote. They may be 
finding out about right and wrong answers but they are not discovering
how to conceptualize and analyze problems. In fact,
even though students may be 'test-ready' (that is, they may have
mastered good test-taking skills), they may not have developed the
theoretical constructs necessary for them to see relationships 
among ideas or to apply information to a variety of learning 
contexts. No one can deny that youngsters are excellent imita- 
tors and that imitation plays an important role in their overall 
physical and intellectual growth. Nevertheless, when imitation 
is taught at the expense of experiential learning, complex rea- 
soning and problem-solving skills that are critical for higher- 
order thinking are bypassed. 



Given the above arguments, many researchers (Madaus, 1983, 1988; 
Meisels, S.J., 1986) and child advocacy groups (National Center
for Fair and Open Testing, 1987; National Association for the 
Education of Young Children, 1988) have advanced the contention 
that tests administered on a one-time basis cannot independently 
account for student progress or the overall success of a given 
educational program. They have proposed that pupil outcomes be 
judged in light of a range of variables that address achievement 
from both a qualitative and quantitative perspective. Generally, 
researchers concerned with the appropriate assessment of young 
children have proposed that interpretations of test outcomes be 
made in light of three factors: the degree of fit between a test 
and the instructional program it measures; the cultural and 
language preferences of students; and the intended use of the 
test — that is, how results will impact a student's educational 
program. 



METHODS 

SUBJECTS 

Three males and five females participated in the follow-up study 
of children identified as being at risk of school failure. The 
children reflected the cultural and socioeconomic diversity of 
the Boston Public Schools: four were Black, two were Hispanic, 
one was Asian, and one was White; five qualified for a free 
school lunch. At the time they were assessed on the Metropolitan 
Achievement Test (MAT6), the youngsters ranged in age from 75 to
85 months; four were below the age of seven and four were seven 
or slightly older. All subjects were judged by their teacher to 
be dominant speakers of English even though two heard a language 
other than English in the home. 

As previously mentioned, subject selection was based on a child 
having tested poorly on the MAT6 (at or below the 40th percentile).
The percentile ranks for the children in question fell
between the 5th and 37th rank. 
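
The selection rule described above can be sketched in a few lines of code. This is only an illustration, assuming the cutoff is applied as "at or below the 40th percentile"; the function name `is_at_risk` and the constant `AT_RISK_CUTOFF` are hypothetical and do not come from the school department's procedures. Only the lowest and highest reported ranks (5 and 37) are used, since individual scores are not listed here.

```python
# Hypothetical sketch of the subject-selection rule; names are illustrative.
AT_RISK_CUTOFF = 40  # percentile cutoff reported for the Boston Public Schools


def is_at_risk(percentile_rank: int, cutoff: int = AT_RISK_CUTOFF) -> bool:
    """Return True when a percentile rank falls at or below the cutoff."""
    if not 1 <= percentile_rank <= 99:
        raise ValueError("percentile rank must be between 1 and 99")
    return percentile_rank <= cutoff


# Lowest and highest percentile ranks reported for the eight subjects.
reported_extremes = [5, 37]
flags = [is_at_risk(rank) for rank in reported_extremes]
```

Because both extremes fall under the cutoff, every subject in the reported range would be flagged under this reading of the rule.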



SECOND-GRADE TEACHERS OF STUDY STUDENTS 

Follow-up data on the reading/language arts school performance of 
observed students were obtained from their second-grade teachers. 
Overall, these individuals had an average of 23 years of instruc- 
tional experience, with 13 of these years in a second-grade 
classroom. Four of the teachers said they primarily used tradi- 
tional instructional techniques; three said that they used a 
balanced blend of traditional and developmental instructional 
techniques; and one indicated using more developmental than 
traditional teaching techniques. 



In February of 1989, structured telephone interviews were con- 
ducted with each of the second-grade teachers of follow-up stu- 
dents. The purpose of these interviews was to determine the level 
of success that the students were experiencing in their reading/ 
language arts school program. 



RATERS OF STUDENT WRITING SAMPLES 

Because follow-up students had moved into more traditional learn- 
ing environments than the one in which they had participated in 
first grade, we believed it would be informative to choose teachers
for writing sample evaluations who were using instructional
methods that articulated well with the format of the Metropolitan 
Achievement Test. In other words, we were looking for individu- 
als who based a large part of their classroom instruction on 
workbook and worksheet learning. Two additional criteria for the 
selection of teachers were that: (1) they be assigned to schools
having students with similar racial and socioeconomic characteristics
as those of Early Learning Center students, and (2) overall,
the 1987-1988 pupils of these teachers had attained
average or better than average scores on the MAT6 Reading Test. 
The intent of this rater-selection process was to give us an 
indication as to whether the first-grade academic standards set 
for follow-up pupils were on a par with standards set by teachers 
of students learning successfully through more traditional meth- 
ods. 

Two teachers from different elementary schools were selected to 
conduct an evaluation of the writing samples of observed stu- 
dents. These individuals had an average of 24 years teaching 
experience, with 15 of these years in a first-grade classroom. 
The teachers were trained to rate transcripts of student jour- 
nals independently. On four practice transcripts, their judge- 
ments for all items on the writing-assessment tool coincided 78% 
of the time. On all but one of the individual items, their 
judgements coincided either 75% or 100% of the time. (See Appen- 
dix A for inter-rater reliability coefficients.) 
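
The agreement figures above can be read as simple percent agreement between the two raters. The sketch below is one possible reading, not the authors' documented procedure: it assumes an exact match of letter grades counts as agreement, and the sample ratings are hypothetical. It also assumes the rating tool has eight items, so that four practice transcripts yield 32 paired judgements, of which 25 matches would give the reported 78%.

```python
from typing import Sequence


def percent_agreement(rater_a: Sequence[str], rater_b: Sequence[str]) -> float:
    """Percentage of paired judgements on which two raters gave the same rating."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters need the same non-zero number of judgements")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100.0 * matches / len(rater_a)


# Hypothetical ratings for a single item across four practice transcripts.
item_rater_a = ["A", "B+", "B", "A-"]
item_rater_b = ["A", "B+", "B-", "A-"]
item_rate = percent_agreement(item_rater_a, item_rater_b)  # 3 of 4 match

# Assumed arithmetic check: 25 matching judgements out of 32 (8 items x 4 transcripts).
overall_rate = 100.0 * 25 / 32
```

With only four transcripts, any single item can agree 0%, 25%, 50%, 75%, or 100% of the time, which is consistent with the per-item figures of 75% and 100% reported above.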



CLASSROOM OBSERVATION DATA 

To determine how instruction was defined and administered, 
trained observers collected 12 hours of classroom data during 
April and May of 1988. All observations were conducted between 
9:00 A.M. and 12:00 P.M. Data from visitations were quantified, 
and results indicated that 53% of the instructional activities 
focused on language development and literacy; that these activi- 
ties occurred in a balanced mix of whole class, small group 
(fewer than 8 students), and individualized learning environ- 
ments; and that nearly 40% of the activities were initiated by 






the children themselves. Quantitative findings thus show that 
the students in question had considerable exposure to literacy 
activities during the school day and that a good number of these 
activities were defined by the children themselves. 



DATA-COLLECTION INSTRUMENTS 

Several indicators of student academic achievement were used to 
assess skills mastery. A general description of these indicators 
appears below with more specific descriptions appearing in the 
discussion of outcome data. 

Metropolitan Achievement Tests (MAT6) 

The Metropolitan Achievement Tests are designed to measure student
academic achievement in major content areas of the school
curriculum. The tests evaluate skills appearing in leading textbook
series, state curriculum guidelines, and school-system
syllabuses. Scores and corresponding percentile ranks provide
general information about a student's performance within a
given discipline relative to the performance of other students at
the same grade level. Comparison norms are established at the
national level.

Pupils in the Boston Public Schools received the Reading and Math 
Metropolitan Tests in early May of 1988. Outcomes for only the 
Reading test are considered in this study. 

Curriculum Referenced Tests (CRT's) 

The Boston Public Schools' Curriculum-Referenced Tests (CRT's)
are in-house, school-department instruments developed to articulate
with the system's curriculum objectives. Each item on a CRT
corresponds to an objective, such as 'listening skills,' and
clusters of test items correspond to a curriculum strand such as
'phonics' or 'word recognition.' The CRT's have been highly
useful to the Boston Public Schools for several reasons: First,
they afford teachers an important means for monitoring the academic
progress of students against a school-department criterion;
second, they provide a diagnostic tool for assessing student
achievement that complements the school department's norm-referenced
(MAT6) testing program; and third, by systematically
capturing the same kinds of data for all grade levels, the CRT's
allow for student comparisons to be made across schools within
the same district and at a systemwide level.^

Curriculum Referenced Tests were administered to the entire stu- 
dent body in late June of 1988. First graders received CRT's in 

3 The Boston Public Schools contains five school districts. 






Reading/Language Arts, Writing, Math, and Science. For the
purpose of this analysis, outcome data for the Reading/Language 
Arts and Writing CRT's will be considered. 

Student Follow-up Survey 

Using a telephone interview questionnaire, teachers were asked 
to rate specific aspects of a follow-up pupil's grammar, vocabu- 
lary, comprehension, and writing skills as compared to those of 
other students in the class. Teachers were also asked to evalu- 
ate a follow-up student's progress in his or her basal reading 
program.* Additionally, they were requested to indicate their 
instructional style from a list of styles ranging from highly 
developmental to highly traditional. (See Appendix B for a copy 
of the Student Follow-up Survey and an explanation of the rating 
categories comprising the instrument.)

Student Writing Sample Evaluation 

Earlier we explained that journal writing was an activity of 
primary importance in the classroom of the students under 
observation. This activity is one which best exemplifies a whole 
language approach to the teaching of literacy. Whole-language
teaching integrates reading, writing, and speaking into a single 
cognitive framework, its purpose being to promote holistic learning
by interrelating different ways of encoding meaningful relationships
(cf. Holdaway, 1979). Because reading and writing are
often treated as alternative means of expressing the same cognitive
processes (Kucer, 1987; Wittrock, 1983), we considered
writing to be an important indicator of a pupil's reading/language
arts competence. 

Consequently, in January of 1989, the first-grade teacher of
follow-up students transcribed the texts from journals the young- 
sters had produced during April of 1988 — the month prior to 
their being assessed in reading on the Metropolitan Achievement 
Test. The material in the journals covered both assigned and
free-choice writing. Assigned topics related to classroom pro- 
jects such as the hatching of chicks, science experiments, let- 
ter-writing, and imaginary thinking. Free-choice topics gener- 
ally described experiences the children had had outside of school 
or within their classroom. Each journal had approximately fif- 
teen entries. 



* As of the 1988-1989 school year, all Boston Public Schools
students in grades K through 8 are required to receive reading/ 
language arts instruction from a single basal series. Compari- 
sons of their basal reading progress are thus based upon the same 
instructional program. 



Writing samples were transcribed exactly as they had been entered
into a given journal, the context being provided for each individual
entry. To facilitate the evaluation of written material
by independent raters, specific transcription symbols flagged
general grammatical or stylistic features of a pupil's writing
that would be considered non-conventional within a school context.
(See Appendix C for a summary of transcription conventions,
and Appendix D for samples of transcribed texts.)

Instructions for the evaluation of student transcripts were as 
follows: 

The rating categories below refer to aspects of a
first-grade student's writing skill that determine
his/her ability to compose clear and effective
expository prose. You have seen that specific
transcription symbols were used to flag non-conventional
features of a student's writing. When
rating each transcript, these 'non-conventionalisms'
should not be considered as errors but rather
as deviations from an adult standard. The adult
standard used here is one that is appropriate to
school settings and does not refer to standards
appropriate to other dialects of English. As you
know, many of the language deviations you have read
in the child texts are normal for a first-grader;
therefore, from a developmental or maturational
point of view, they cannot be treated as 'mistakes.'
Bearing this point in mind, please rate
each transcript according to what you regard as
'extremely high,' 'average,' 'below average,' and
'extremely low' writing performance for a typical
first grader.

The four rating classifications mentioned above corresponded to
the letters 'A,' 'B,' 'C,' and 'D,' respectively, and could be
used with pluses and minuses to modify the value of any given
classification. Each child's writing sample was judged according
to the following criteria: (1) overall writing ability; (2)
spelling and punctuation; (3) grammatical structure; (4) grammatical
usage; (5) grammatical complexity; (6) descriptive modification;
(7) thematic cohesion; and (8) idea development. (See
Appendix E for a copy of the rating instrument.)



ANALYSES OF DATA 

Ratings from the teacher follow-up survey and student writing
samples were entered into computer files. Pupil scores on the
Metropolitan Achievement Test in Reading, and Reading/Language
Arts and Writing scores for the Curriculum Referenced Tests, were
also entered into computer files. The files were then analyzed
using the statistical package SPSS PC+. Analyses focused on the
total-group performance of subjects, student performance by age
(older than seven or younger than seven), and individual student
performance. Findings showed no meaningful differences for the
'age' variable, which consequently will not be discussed in this
paper. Summary data were also scrutinized to locate performance
differences by race and by socioeconomic status. These differences
also were not found. Finally, the data were examined to
determine whether second-grade teacher ratings of pupils' classroom
performance were associated at all with a teacher's instructional
style. The data showed no evidence of this association.



DISCUSSION OF DATA 

STUDENT FOLLOW-UP SURVEY

Teacher responses to telephone questionnaire items are summarized 
for the total student group in Table 1. Categories of information 
parallel those used to assess children on the reading survey of 
the Metropolitan Achievement Test. For the general areas of 
grammar and composition, teachers rated a child on his or her: 
vocabulary skills, knowledge of word meanings and recognition of words 
in context; phonics skills, knowledge of beginning, medial, and final 
sounds; sentence-completion skills, an ability to choose the correct 
word, from a list of words, that completes the meaning for a sequence 
of text; literal comprehension skills, an ability to identify the 
main idea and details of a story; and evaluative comprehension skills, an 
ability to predict story outcomes and logical conclusions, and to 
recognize the components of a story such as the 'saddest,' 'happiest,' 
or 'scariest' part. Teachers judged student writing 
skills on the basis of two criteria: overall grammaticality, that is, 
a child's use of age-appropriate features for spelling and punctuation, 
grammatical structure, grammatical usage, and grammatical 
complexity; and overall descriptive writing ability, a child's literary 
skill in using descriptive modification, thematic cohesion, and 
idea development. (The reader is again encouraged to refer to 
Appendix E for a more elaborate description of the above evaluation 
criteria.) 






TABLE 1 

STUDENT FOLLOW-UP SURVEY 
AVERAGE SKILL-AREA RATINGS 

RATING SCALE: 8(A), 7(A-), 6(B+), 5(B), 4(B-), 3(C+), 2(C), 1(C-) 

SKILL AREA                     GROUP MEAN    SCORE RANGE 

VOCABULARY SKILLS              6 (B+)        4 (B-) TO 8 (A) 
PHONICS SKILLS                 6 (B+)        4 (B-) TO 8 (A) 
SENTENCE-COMPLETION SKILLS     5 (B)         3 (C+) TO 8 (A) 
LITERAL COMPREHENSION          6 (B+)        3 (C+) TO 7 (A-) 
EVALUATION COMPREHENSION       6 (B+)        3 (C+) TO 8 (A) 
GRAMMATICAL WRITING ABILITY    5 (B)         4 (B-) TO 8 (A) 
DESCRIPTIVE WRITING ABILITY    5 (B)         2 (C) TO 8 (A) 



Results of teacher ratings presented in Table 1 show that, on 
average, students were considered to be performing at a 'B' or 
'B+' level in all of the specified categories. Although scores 
for individual items ranged from 'C' to 'A', only one subject per 
category was assigned a rating lower than 'B-'. 

Also, comments from teachers regarding the academic performance of 
the students in question were all favorable. One individual who 
characterized her instructional style as a balanced blend of both 
developmental and traditional features remarked, "I really notice 
a difference between him and the other children. He has very 
good work habits and completes work on time. This may be a 
result of what they did with him last year. He really enjoys 
reading, and I know he gets support at home and that helps." 

^ Parent involvement in their child's literacy process is an 
intrinsic component of the developmental education model at the 
Early Learning Center in question. 






Another teacher with similar self-defined instructional 
characteristics praised a follow-up pupil's strong writing ability, 
explaining that "Whatever they were doing (at the ELC) in the 
area of writing, it was successful." Regarding three of the 
students in her classroom, another far more traditional teacher 
remarked, "I was surprised when I saw their 'MET' scores compared 
to the performance they are giving me. That MET test is not an 
indication of what a child can do!" Nevertheless, in terms of 
conventional school behaviors, the same teacher stated, "The kids 
are sooo different from any other second grader I have ever had 
before. They are beautiful readers, but they don't know how to 
sit in chairs, or they sit with their feet on chairs, or they 
like to sit on the floor. The three are great students...but 
they're into 'do your own thing.' I can't have that." 

Table 2 presents a breakdown for the ratings of follow-up students' 
grammar and composition ability presented in Table 1. 
Again, it should be pointed out that the content of 
student-assessment categories was guided by the content of subtest areas 
evaluated on the Metropolitan Achievement Test. Data comparing 
teacher ratings of student abilities with student subtest percentile 
ranks for the reading survey of the MAT6 show marked 
discrepancies between a teacher's assessment of a child's skills 
level and the test's assessment of that same skills 
level. For example, vocabulary ratings assigned by teachers are 
composite scores for items measuring a student's knowledge of 
word meanings, an ability to recognize words in context, and an 
ability to identify the correct word that completes the meaning 
for a sequence of text. If these ratings are contrasted with the 
percentile ranks obtained by follow-up students on the MAT6 
vocabulary subtest, which also assesses the previously mentioned 
skills, we notice that the information in column 1 does not 
correspond to the information in column 2. 

In fact, with the exception of one youngster who generally scored 
slightly lower than the others,^ all follow-up students were 
considered by their teachers to have attained a solid average or 
above average academic standing in grammar (vocabulary and word- 
recognition skills) and comprehension. The discrepancy between 
teacher appraisals of student classroom performance and the 
performance of students on the MAT6 is seen even more clearly if 
mean teacher ratings for specific skills areas are compared with 
median percentile ranks for comparable subtest areas on the MAT6. 



^ The teacher of this child explained that her ratings were 
influenced by the child's lack of motivation and not by the 
child's lack of training. 






TABLE 2 

SCORE COMPARISONS 
SECOND-GRADE TEACHER RATINGS WITH MAT6 SUBTEST SCORES 

RATING SCALE: 8(A), 7(A-), 6(B+), 5(B), 4(B-), 3(C+), 2(C), 1(C-) 

         VOCAB     VOCAB    WORD      WORD     COMP      COMP 
ID       TCHR      MAT6     TCHR      MAT6     TCHR      MAT6 
         RATING    %ILE     RATING    %ILE     RATING    %ILE 

1        8 (A)     41       8 (A)     37       8 (A)     39 
2        7 (A-)    22       8 (A)     47       8 (A)     46 
3        7 (A-)    31       7 (A-)    37       8 (A)     28 
4        4 (B-)    18       4 (B-)    42       3 (C+)    19 
5        6 (B+)    18       4 (B-)    32       5 (B)     31 
6        5 (B)     14       5 (B)     32       5 (B)     28 
7        6 (B+)    36       8 (A)     47       5 (B)     31 
8        5 (B)     10       6 (B+)    23       6 (B+)    3 

MEAN 
RATING                      5 (B)              6 (B+) 

MEDIAN 
%ILE               20                 37                 30 



Note: Ratings of student performance were obtained from the second-grade teacher of each 
target child. The content of items evaluated by teachers parallels the content of items for 
the appropriate subtest on the MAT6. 

VOCABULARY RATINGS ARE COMPOSITE SCORES FOR ITEMS MEASURING A 
STUDENT'S KNOWLEDGE OF WORD MEANINGS, ABILITY TO RECOGNIZE WORDS 
IN CONTEXT, AND ABILITY TO IDENTIFY THE CORRECT WORD, FROM A LIST OF 
WORDS, THAT COMPLETES THE MEANING FOR A SEQUENCE OF TEXT. 

WORD RECOGNITION RATINGS MEASURE A STUDENT'S ABILITY TO IDENTIFY 
BEGINNING, MEDIAL, AND FINAL SOUNDS. 

COMPREHENSION RATINGS ARE COMPOSITE SCORES FOR ITEMS MEASURING A 
STUDENT'S ABILITY TO IDENTIFY THE MAIN IDEA AND DETAILS OF A 
STORY, PREDICT STORY OUTCOMES AND LOGICAL CONCLUSIONS, AND 
IDENTIFY COMPONENTS OF A STORY SUCH AS THE 'SADDEST,' OR 
'SCARIEST' PART. 



Data in Table 3 contrast teacher appraisals of the skills level of 
follow-up students in second grade with test scores measuring 
their skills attainment at the end of first grade. A composite 
teacher rating for a student's overall grammar and composition 
ability was compared with the student's score on the school 
department's Curriculum-Referenced Test in Reading/Language Arts 
and with his or her percentile rank for Reading on the MAT6. The 
data show that the Reading/Language Arts CRT more accurately 
captures a student's academic capability, as evidenced by 
his or her second-grade achievement, than does the Metropolitan 
Achievement Test. This is because the CRTs are based directly 
on curriculum guides that drive the instructional content for the 
Boston Public Schools. Items on the Metropolitan Achievement 
Test, however, are drawn from a variety of curriculum sources 
nationally and span several grade levels,^ which the CRTs do 
not do. Also of importance is that the CRTs are administered 
with more flexible time limits than are the Metropolitan 
Achievement Tests. 

The data here show that when an instrument both articulates 
closely with a school system's curriculum and assesses student 
knowledge appropriate for a given grade level, that instrument 
provides a more realistic account of a pupil's actual and potential 
school performance than does a norm-referenced measurement 
tool covering material beyond a pupil's assigned grade. This 
finding raises questions about the 'curricular validity' of the 
Metropolitan Achievement Test for Boston Public Schools' students. 
More specifically, if an important goal of the MAT6 is to 
use objective criteria to identify students who may be in danger 
of failing academically, then it is doubtful whether an instrument 
that measures skills taught beyond a child's actual grade 
level can accomplish this objective with a high degree of accuracy. 



^ Items on the Metropolitan Achievement Test for Grade 1 test 
knowledge appropriate for Grades 1, 2, and 3. 






TABLE 3 



INDIVIDUAL SUBJECT ANALYSIS 
READING/LANGUAGE ARTS 

RATING SCALE: 8(A), 7(A-), 6(B+), 5(B), 4(B-), 3(C+), 2(C), 1(C-) 

         GRAMMAR/COMP     CRT % CORRECT     MAT6 READING 
ID       2ND GRADE        1ST GRADE         %ILE 

1        8 (A)            93                37 
2        7 (A-)           80                37 
3        7 (A-)           80                27 
4        3 (C+)           100               20 
5        7 (A-)           73                23 
6        5 (B)            80                20 
7        5 (B)            87                34 
8        5 (B)            93                5 

         6 (B+)           86                25 
         MEAN RATING      MEAN RATING       MEDIAN %ILE 




COMPOSITION SCORES: TEACHER RATINGS FOR 
VOCABULARY, PHONICS, SENTENCE-COMPLETION, LITERAL 
COMPREHENSION, AND EVALUATION COMPREHENSION SKILLS. 

CRT READING SCORES: PERCENT OF ITEMS SCORED CORRECTLY ON 
THE BOSTON PUBLIC SCHOOLS' CURRICULUM REFERENCED 
READING/LANGUAGE ARTS TEST. THE CRT CONTAINS ITEMS 
WHICH EVALUATE A STUDENT'S GRAMMAR AND COMPOSITION 
SKILLS. 

MAT6 PERCENTILE RANKS: BASED ON A COMPOSITE SCORE FOR THE 
VOCABULARY, WORD RECOGNITION, AND READING 
COMPREHENSION SUBTESTS. 
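
The summary figures reported for Table 3 (a mean teacher rating, a mean CRT percent correct, and a median MAT6 percentile) can be re-derived from the eight student rows. The sketch below is an illustration rather than the authors' SPSS procedure: the tuples are transcribed from the table rows as printed, and the variable and function names are hypothetical.

```python
from statistics import fmean, median

# (teacher rating, CRT % correct, MAT6 reading percentile) for students 1-8,
# transcribed from the rows of Table 3 as printed.
TABLE_3 = [
    (8, 93, 37),
    (7, 80, 37),
    (7, 80, 27),
    (3, 100, 20),
    (7, 73, 23),
    (5, 80, 20),
    (5, 87, 34),
    (5, 93, 5),
]


def summarize(rows):
    """Return the unrounded summary statistics for the three columns."""
    ratings = [row[0] for row in rows]
    crt_scores = [row[1] for row in rows]
    percentiles = [row[2] for row in rows]
    return {
        "mean_rating": fmean(ratings),
        "mean_crt": fmean(crt_scores),
        "median_mat6": median(percentiles),
    }


summary = summarize(TABLE_3)
print(summary)
```

The unrounded values are 5.875, 85.75, and 25.0, which round to the 6 (B+), 86, and 25 shown in the table's summary row.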



WRITING SAMPLE EVALUATIONS 



Earlier we referred to the connection between reading and writing 
and argued that both processes require similar cognitive operations. 
These operations consist of the organization of meaning 
through language to define, develop, classify, and conjoin ideas. 
Relating experience and knowledge to a text is the foundation for 
both reading and writing, which are active processes of composing 
and comprehending. More precisely stated: 



Composing and comprehending are process-oriented 
thinking skills which are basically interrelated. 
Composing...actively engages the learner in constructing 
meaning, in developing ideas, in relating 
ideas, and in expressing ideas. Comprehending...requires 
the learner to reconstruct the 
structure and meaning of ideas expressed by another 
writer. 

(Tierney and Pearson, 1983, p. 582) 



Because of this close association between reading and writing, we 
decided to evaluate journal material follow-up students had 
produced immediately prior to the assessment of their 
reading/language arts skills on the MAT6. We determined that a 
scrutiny of pupil writing samples would provide an additional 
measure of their reading/language arts competence at the time of 
testing. Results of writing evaluations by two independent 
raters are presented in Table 4. 






TABLE 4 

STUDENT JOURNAL ANALYSIS 
AVERAGE SKILL-AREA RATINGS 

RATING SCALE: 8(A), 7(A-), 6(B+), 5(B), 4(B-), 3(C+), 2(C), 1(C-) 

SKILL AREA                     GROUP MEAN    SCORE RANGE 

OVERALL WRITING ABILITY        6 (B+)        3 (C+) TO 7 (A-) 
SPELLING AND PUNCTUATION       6 (B+)        3 (C+) TO 7 (A-) 
GRAMMATICAL STRUCTURE          6 (B+)        2 (C) TO 7 (A-) 
GRAMMATICAL USAGE              6 (B+)        3 (C+) TO 8 (A) 
GRAMMATICAL COMPLEXITY         6 (B+)        3 (C+) TO 7 (A-) 
DESCRIPTIVE MODIFICATION       6 (B+)        3 (C+) TO 8 (A) 
THEMATIC COHESION              6 (B+)        3 (C+) TO 7 (A) 
IDEA DEVELOPMENT               6 (B+)        3 (C+) TO 8 (A) 



As expected, mean ratings of the writing samples for the pupils in 
question showed better-than-average scores for all the categories 
specified on the evaluation instrument. Not only did the children's 
texts reflect their ability to use age-appropriate grammatical 
features, but the texts also revealed that the children 
had skill in the use of descriptive terms, in relating sentences 
to one another, and in elaborating on a topic. Most student 
transcripts received scores of B+ or A-, with one transcript in 
each category receiving a rating of C or C+. 






COMPARISONS OF FOLLOW-UP AND WRITING-SAMPLE DATA 



In Table 5 we further specify judgements of student writing ability. 
We begin by comparing first-grade teacher ratings of a 
child's grammatical writing ability with second-grade teacher 
ratings for the same ability area. Grammar scores are composite 
ratings, based upon individual item ratings. The category Grammar 
refers to the use of age-appropriate spelling and punctuation, 
word sequences, word endings, word choice, pronoun reference, 
and sentence types. We also examined a pupil's stylistic 
writing ability. For this area, we asked first- and second-grade 
teachers to judge a child on his or her skill in using 
descriptive modification, developing ideas, and conjoining 
related ideas. 

Cross-comparisons of teacher judgements for both the Grammar and 
Stylistics categories show some similarity between contrasted 
ratings. Evaluations of first-grade student journals are 
slightly higher than second-grade teacher evaluations of overall 
student grammatical and writing ability. It is important to 
note, however, that whereas writing sample evaluations are based 
upon a specific product, teacher evaluations are not. Score 
discrepancies, then, may be attributable in part to specific 
versus general analyses of student performance. Nevertheless, 
mean ratings for the total group of observed students show a high 
degree of consistency across comparison categories. 

An additional analysis contrasted outcome scores for follow-up 
students on the school department's Curriculum-Referenced Test in 
Writing with ratings for student journal entries. The CRT writing 
test elicits three narratives, each developed around a different 
picture stimulus. These narratives are scored by the 
classroom teacher against criteria measuring the creativity of a 
pupil's text, its spelling, capitalization and punctuation, idea 
development, topic unity, and general appearance. The purpose of 
comparing CRT writing scores with journal scores was to contrast 
the judgements made by study pupils' first-grade teacher with 
judgements of their writing ability made by other first-grade 
teachers. 

Data show moderately high levels of agreement between teacher 
ratings of pupil writing competence. We have argued that writing, 
like reading, involves establishing relationships among 
words, sentences, paragraphs, and texts, and that good writing 
skills are closely associated with good reading skills. Teacher 
information regarding the progress of observed students in their 
second-grade basal reading program provides support for this 
contention. On a rating scale ranging from 'poor' to 'excellent,' 
the progress of two students was judged to be 'good,' that 
of five students, 'very good,' and that of one student, 'excellent.' 
Furthermore, all the students were considered to be reading 
at or above grade level, and some were even at the top of 
their class. This finding lends further credence to the argument 
that standardized achievement tests may not be the most appropriate 
measures of actual student knowledge. 



TABLE 5 

INDIVIDUAL SUBJECT ANALYSIS 
WRITING 

RATING SCALE: 8(A), 7(A-), 6(B+), 5(B), 4(B-), 3(C+), 2(C), 1(C-) 

         GRAMMAR    GRAMMAR    STYLISTICS    STYLISTICS 
ID       JOURNAL    2ND GR     JOURNAL       2ND GR        CRT 
         RATING     TCHR       RATING        TCHR          SCORE 

1        6 (B+)     8 (A)      7 (A-)        8 (A)         93 
2        6 (B+)     6 (B+)     8 (B+)        8 (A)         90 
3        7 (A-)     6 (B+)     7 (A)         5 (B)         93 
4        7 (A-)     4 (B-)     8 (A)         4 (C)         93 
5        4 (B-)     5 (B)      4 (B-)        5 (B)         90 
6        6 (B+)     4 (B-)     7 (A-)        4 (B-)        93 
7        6 (B+)     5 (B)      6 (B+)        5 (B)         97 
8        3 (C+)     5 (B)      3 (C+)        5 (B)         87 

MEAN 
RATING              5 (B)      6 (B+)        5 (B)         92 



Note: Ratings for first-grade student journals are based upon texts written in April of 1988. 
The scores reported above are an average of the composite score of each of two 
independent raters. 

GRAMMAR RATINGS OF STUDENT JOURNALS: OBTAINED FROM ITEMS JUDGING A 
STUDENT'S WRITING ABILITY IN THE AREAS OF SPELLING AND 
PUNCTUATION, GRAMMATICAL STRUCTURE, GRAMMATICAL USAGE, AND 
GRAMMATICAL COMPLEXITY. 

STYLISTIC RATINGS OF STUDENT JOURNALS: OBTAINED FROM ITEMS JUDGING A 
STUDENT'S WRITING ABILITY IN THE AREAS OF DESCRIPTIVE 
MODIFICATION, THEMATIC COHESION, AND IDEA DEVELOPMENT. 

CRT WRITING SCORES: ASSIGNED TO STUDENTS BY THEIR FIRST-GRADE TEACHER. 
THE SCORES MEASURE WRITING SAMPLES ON THE BASIS OF THEIR 
CREATIVITY, PARAGRAPH UNITY, SPELLING, CAPITALIZATION AND 
PUNCTUATION, AND GENERAL APPEARANCE. 
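
The note to Table 5 says that journal scores are an average of each of two raters' composite scores. One way such a score could be built is sketched below. This is an assumption-laden illustration: the paper does not say that a composite is the arithmetic mean of its item ratings, the item keys and function names are hypothetical, and the two raters' numbers are invented.

```python
from statistics import fmean

# Items feeding each composite, following the table notes.
GRAMMAR_ITEMS = (
    "spelling_punctuation",
    "grammatical_structure",
    "grammatical_usage",
    "grammatical_complexity",
)
STYLISTICS_ITEMS = (
    "descriptive_modification",
    "thematic_cohesion",
    "idea_development",
)


def composite(ratings, items):
    """Mean of one rater's numeric item ratings (assumed composite rule)."""
    return fmean([ratings[item] for item in items])


def journal_scores(rater_a, rater_b):
    """Average the two raters' composites for each category."""
    return {
        "grammar": fmean([composite(rater_a, GRAMMAR_ITEMS),
                          composite(rater_b, GRAMMAR_ITEMS)]),
        "stylistics": fmean([composite(rater_a, STYLISTICS_ITEMS),
                             composite(rater_b, STYLISTICS_ITEMS)]),
    }


# Invented ratings on the 8-point scale for a single transcript.
rater_a = {"spelling_punctuation": 6, "grammatical_structure": 6,
           "grammatical_usage": 7, "grammatical_complexity": 5,
           "descriptive_modification": 7, "thematic_cohesion": 6,
           "idea_development": 8}
rater_b = {"spelling_punctuation": 5, "grammatical_structure": 6,
           "grammatical_usage": 6, "grammatical_complexity": 7,
           "descriptive_modification": 6, "thematic_cohesion": 6,
           "idea_development": 6}

scores = journal_scores(rater_a, rater_b)
print(scores)
```

For these invented ratings both raters' grammar composites are 6.0, while the stylistics composites are 7.0 and 6.0, giving an averaged stylistics score of 6.5.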






CONCLUSION 



In this pilot study we have looked at several measures of student 
achievement in the area of reading/language arts. We have shown 
that a standardized achievement test administered to eight 
first-grade pupils enrolled in a developmental education learning 
program did not accurately reflect either their first-grade 
reading/language arts skills level or their second-grade reading/language 
arts school performance. We have also shown that other indicators 
of pupil academic attainment, such as teacher evaluations of 
student products and internally developed school department 
criterion-referenced tests, more accurately mirrored student 
academic capability and potential. 

Our discussion has also centered on the interpretation of stan- 
dardized achievement test results. We have explained that in 
the Boston Public Schools, end-of-year outcomes on the Metropoli- 
tan Achievement Test (MAT6) are used to classify students academ- 
ically. A student scoring at or below the 40th percentile in a 
given academic area is formally labeled as being at risk of failing 
that area during the next school year. Whereas poor performance 
on the MAT6 may point to severe academic shortcomings for some 
students, this certainly was not the case for the eight at-risk 
students on whom we did follow-up work. Even though these youngsters 
had Metropolitan Achievement Test scores in Reading/Language 
Arts that placed them in the danger zone according to 
school-department standards, their literacy skills were at or 
above grade level according to teacher standards. 
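
The labeling rule described above, at or below the 40th percentile in a given area, can be written as a one-line check and compared with teacher judgements. The sketch below is illustrative only: the ratings and percentiles are transcribed from Table 3 as printed, the names are hypothetical, and the choice of 'B or better' (a rating of 5 or higher) as the comparison point is an assumption made for the example.

```python
AT_RISK_CUTOFF = 40  # percentile at or below which a student is labeled at risk


def is_at_risk(percentile, cutoff=AT_RISK_CUTOFF):
    """Apply the 'at or below the cutoff percentile' rule."""
    return percentile <= cutoff


# student ID -> (second-grade teacher rating, MAT6 reading percentile),
# transcribed from Table 3 as printed.
students = {
    1: (8, 37), 2: (7, 37), 3: (7, 27), 4: (3, 20),
    5: (7, 23), 6: (5, 20), 7: (5, 34), 8: (5, 5),
}

flagged = [sid for sid, (_, pct) in students.items() if is_at_risk(pct)]

# Flagged students whose teacher nevertheless rated them B (5) or better.
flagged_but_rated_b_or_better = [
    sid for sid, (rating, pct) in students.items()
    if is_at_risk(pct) and rating >= 5
]

print(flagged)
print(flagged_but_rated_b_or_better)
```

On these figures all eight students fall at or below the cutoff, and seven of the eight (all but student 4) hold a teacher rating of B or better.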

Finally, our pilot study raises questions about the articulation 
between test format and instructional format and test content and 
instructional content. Our findings suggest that a mismatch 
occurs when children acquiring skills through a concrete, hands- 
on, and exploratory approach to learning are evaluated on a test 
that taps knowledge through an abstract, decontextualized format. 
We have noted that young children are active learners who must 
construct and manipulate their environment in order to develop 
good conceptual skills, and that rote learning for the improve- 
ment of test performance does not promote strong critical- 
reasoning ability. Additionally, we have proposed that tests 
which assess skills across several grade levels may not accur- 
ately uncover a student's actual grade-level knowledge; conse- 
quently, these instruments may not have the curricular validity 
they are intended to have. 

Relative to policy setting for the Boston Public Schools, the 
implications of our findings are both programmatic and fiscal. 
When the results of a single test are used to label children 
academically, youngsters may risk being assigned to a Chapter 1 
remedial program primarily on the basis of a single criterion: 
a reading or math score on the MAT6 that falls at or below the 
40th percentile.^ In the case of the eight children here, four 
have been receiving Chapter 1 remediation services in reading, 
but three of the four are among the top students in their class. 
At a yearly cost to the school department of almost $1300 for 
each child, this may not be the wisest allocation of monetary 
resources. 

Relative to the psychological well-being of pupils, those individuals 
identified as being at risk in a given subject area on 
the Metropolitan Achievement Test may run an even greater risk 
during the course of their instructional program. This risk is 
one of being labeled and treated as a low achiever. When teach- 
ers see an extremely low score on a standardized achievement 
test, they may be prone to make assumptions about a child that do 
not necessarily characterize his or her talents. For this rea- 
son, teachers must continually be apprised of the dangers of 
judging student abilities on the basis of a single measure of 
academic progress. 

In fields such as psychology, linguistics, and anthropology, 
information from case studies on a few subjects has helped set 
the stage for research and analyses of large groups of children. 
Therefore, given the empirical data we have presented here, we 
encourage school systems that use test scores to classify chil- 
dren to conduct follow-up research on youngsters at the early 
childhood level, especially those youngsters identified as being 
at educational risk. In this way, research findings can inform 
school personnel and policy makers of the degree of fit between 
an assessment tool and student success in the program that tool 
is measuring. We also encourage school systems that use test 
scores to classify children to think about the limitations on the 
information we get from a child's test score, and with these 
limitations in mind, to consider relying on a variety of child 
performance indicators when making major educational policy 
decisions. 



^ We wish to qualify this statement by indicating that not all 
children who are eligible for Chapter 1 services necessarily 
receive them. Placement in the Chapter 1 program depends 
primarily on how low the child's percentile rank is on the MAT6 
as compared to other children in his/her school. Schools having 
a great many students with scores falling well below the 40th 
percentile rank will have a lower cut-off point for Chapter 1 
placement than will schools having fewer students in the lowest 
percentile ranks. 






REFERENCES 



Brown, J.S., A. Collins & P. Duguid. (1989). Situated cognition 
and the culture of learning. Educational Researcher, 18(1), 32-42. 

Cohen, S.A. (1988). Tests: Marked for life. New York: Scholastic, 
Inc. 

Holdaway, D. (1979). Foundations of literacy. Portsmouth, N.H.: 
Heinemann Educational Books, Inc. 

Kucer, S.B. (1987). The cognitive base of reading and writing. In 
J.R. Squire (Ed.), The dynamics of language learning (pp. 27-51). 
Urbana, Ill.: ERIC Clearinghouse on Reading and Communication 
Skills. 

Madaus, G.F. (1983, November). Test scores: What do they really 
mean in educational policy. Paper presented at the 1983 
Convention of the New Jersey Education Association, Atlantic 
City, N.J. 

Madaus, G.F. (1988). The influence of testing on the curriculum. 
In L.N. Tanner (Ed.), Critical issues in curriculum: Eighty-seventh 
yearbook of the National Society for the Study of 
Education (pp. 83-121). Chicago: University of Chicago 
Press. 

Meisels, S.J. (1986). Testing four- and five-year-olds. Educational 
Leadership, 90-92. 

Medina, N. & D. Monty Neill. (1988, June). Fallout from the testing 
explosion: How 100 million standardized exams undermine 
equity and excellence in America's public schools. Cambridge, 
MA: National Center for Fair and Open Testing. 

NAEYC position statement on developmentally appropriate practice 
in the primary grades, serving 5- through 8-year-olds. In S. 
Bredekamp (Ed.), Developmentally appropriate practice in early 
childhood programs serving children from birth through age 8 
(pp. 64-84). Washington, D.C.: National Association for 
the Education of Young Children. 

NAEYC position statement on standardized testing of young children 
3 through 8 years of age. (1988, March). Young Children, 
42-47. 

Piaget, J. (1954). The construction of reality in the child. New 
York: Basic Books. 

Piaget, J. (1970). Piaget's theory. In 
P.H. Mussen (Ed.), Carmichael's manual of child psychology: 
Vol. 1 (pp. 703-732). New York: Wiley. 

Piaget, J. & Inhelder, B. (1969). The psychology of the child. 
New York: Basic Books. 

Tierney, R.J. & P.D. Pearson. (1983). Toward a composing model of 
reading. Language Arts: Reading and Writing, 60(5), 568-580. 

Wittrock, M.C. (1983). Writing and the teaching of reading. 
Language Arts: Reading and Writing, 60(5), 600-606. 






APPENDIX A 
INTER-RATER RELIABILITY COEFFICIENTS 

PERCENTAGE OF INTER-RATER AGREEMENT 

ITEM 1     75% 
ITEM 2     50% 
ITEM 3     75% 
ITEM 4     100% 
ITEM 5     75% 
ITEM 6     75% 
ITEM 7     100% 
ITEM 8     75% 
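
A percentage of inter-rater agreement for one item can be computed as the share of transcripts on which two raters gave the same rating. The sketch below shows that calculation under stated assumptions: the paper does not describe how agreement was defined or how many transcripts were compared, so exact-match agreement, the function name, and the sample ratings are all hypothetical.

```python
def percent_agreement(rater_a, rater_b):
    """Percentage of transcripts on which two raters gave identical ratings.

    Assumption: agreement means an exact match of the letter rating,
    including any plus or minus.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty set")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100 * matches / len(rater_a)


# Invented ratings for one item across four transcripts.
item_ratings_a = ["B+", "A-", "B", "C+"]
item_ratings_b = ["B+", "A-", "B-", "C+"]

print(percent_agreement(item_ratings_a, item_ratings_b))
```

Here the raters match on three of four transcripts, so the function returns 75.0.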






APPENDIX B 
STUDENT FOLLOW-UP SURVEY 






STUDENT FOLLOW-UP PERFORMANCE EVALUATION 
EARLY LEARNING CENTER • DISTRICT A 
TELEPHONE INTERVIEW PROTOCOL 

(Name of Teacher), I'm going to ask you about (Student's Name)'s school 
performance in relation to the performance of other students in 
his/her class. I'm particularly interested in knowing how 
(Student's Name)'s skills compare to those of his/her peers in two 
broad content areas: Reading/Language Arts and Writing. 

Using a scale of A, B, C, D, where A indicates 'extremely high performance,' 
B, 'average performance,' C, 'below average performance,' and D, 'extremely low 
performance,' I'd like you to judge the level of (Student Name)'s 
abilities against those of his/her classmates. You may use 
pluses and minuses with each letter you assign to a particular 
ability area. Let's begin with Reading/Language Arts. How would 
you rate (Student Name)'s: 

1. Vocabulary Skills 

Knowledge of word meanings and recognition of words in 
context. 

A B C D 



2. Phonics Skills 

Identifying beginning, medial, and final sounds. 

A B C D 



3. Sentence-Completion Skills 

Choosing the correct word, from a list of words, that will 
complete the meaning for a sequence of text. 

A B C D 



4. Literal Comprehension Skills 

Identifying the main idea and details of a story. 

A B C D 



5. Evaluative Comprehension Skills 

Predicting story outcomes and logical conclusions; identifying 
components of a story such as the 'saddest,' 'happiest,' 
or 'scariest' part. 

A B C D 






In the area of WRITING, how would you rate (Student Name)'s: 



6. Overall grammatical writing ability 

Includes age-appropriate features for: (1) spelling and 
punctuation; (2) grammatical structure (appropriate word 
sequences, the use of correct person and tense endings on 
verbs, the use of appropriate plural markers on nouns); 
(3) grammatical usage (appropriate word choice and pronoun 
reference); and (4) grammatical complexity (the use of 
different sentence types: simple, compound, complex). 

A B C D 



7. Overall descriptive writing ability 

Includes: (1) descriptive modification (appropriate use of 
adjectives and adverbial modifiers); (2) thematic cohesion 
(sentences are related to each other in a meaningful 
way); and (3) idea development (the ability to elaborate on a 
topic). 

A B C D 



Now I have just a few more general follow-up questions. 

8. How is (Student) progressing in his/her basal reading program? 
Would you say his/her progress is: 

A. excellent B. very good C. good D. fair E. poor 



Name of basal reading series 

Name of book student is using in series 
Level of book in series 



9. For approximately how many hours a week does (Student Name) 
engage in self-directed reading with texts other than a 
basal? 






10. For approximately how many hours a week does (Student Name) 
engage in self-directed process writing? 



11. In terms of a continuum of teaching styles where a developmental 
teaching style is characterized by hands-on, discovery-oriented, 
student-directed, and process-oriented learning 
and a traditional teaching style is characterized by paper and 
pencil, workbook, teacher-directed, and product-oriented 
learning, how would you characterize your instructional 
style? Would it be: 

A. completely developmental 

B. primarily developmental with some traditional features 

C. a balanced blend of developmental and traditional features 

D. primarily traditional with some developmental features 

E. completely traditional 






APPENDIX C 
TRANSCRIPTION CONVENTIONS 






STUDENT TRANSCRIPTS 
KEY TO TRANSCRIPTION CONVENTIONS 

OMISSION OF PUNCTUATION MARKS, CAPITAL LETTERS, OR 
A INCORRECT PUNCTUATION, e.g., ' for 

I like chicks / they are beautiful. 

O OMISSION OF A PERSON/TENSE MARKER, AN AUXILIARY VERB, OR 
A PLURAL MARKER ON A NOUN 

Yesterday I go to the movies; I been there many times; 
There are two chick in the incubator. 

X OMISSION OF A WORD OR SEQUENCE OF WORDS 

We planted and the seeds were going to grow into flowers. 

UNCONVENTIONAL SPELLING OR RUN-ON WORDS 

I played a horsegame. 

O INCORRECT WORD USAGE, e.g., 'a' for 'an,' 'the' for 'a,' 
incorrect pronoun reference 

Once upon a time there was the dog named Sam. 

It's a nice horse, and I think he's pretty. 

OVERUSE OF CAPITALS/PUNCTUATION 

The book is on The table near the Chair. 

REVERSE WORD ORDER 

He helped me it do. 

) UNCLEAR/GARBLED TEXT: cannot decipher meaning 

^ MEANING GLOSS: interprets or paraphrases student's 
intended meaning. 






APPENDIX D 
SAMPLES OF TRANSCRIBED TEXTS 




[Handwritten student journal samples with transcription markings, 
reproduced as images in the original. Legible context labels 
include writing about chicks, free-choice writing, a school 
cookout, and a recent science lesson.] 



APPENDIX E 
WRITING SAMPLE EVALUATION INSTRUMENT 




Evaluation Instrument



STUDENT WRITING SAMPLES 

Transcript # 

The rating categories below refer to aspects of a first-grade stu-
dent's writing skill that determine his or her ability to compose
clear and effective expository prose. You have seen that spe-
cific transcription symbols were used to flag non-conventional
features of a student's writing. When rating each transcript,
these 'non-conventionalisms' should not be considered as errors
but rather as deviations from an adult standard. The adult
standard used here is one that is appropriate to school settings
and does not refer to standards appropriate to other dialects of
English. As you know, many of the language deviations you have
read in the child texts are normal for a first-grader; therefore,
from a developmental or maturational point of view, they cannot
be treated as 'mistakes.' Bearing this point in mind, please
rate each transcript according to what you regard as 'extremely
high,' 'average,' 'below average,' and 'extremely low' writing
performance for a typical first-grade student.



RATING SCALE 
PLEASE CIRCLE THE LETTER OF YOUR RESPONSE 
To qualify your judgements, you may add pluses and minuses to
your letter choice, e.g., B+, C-, etc.

A 'extremely high'

B 'average'

C 'below average'

D 'extremely low'



1. OVERALL WRITING ABILITY 



A B C D 



2. SPELLING AND PUNCTUATION

The use of age-appropriate spelling, word segmentation, and punctuation.

A B C D 









3. GRAMMATICAL STRUCTURE
Appropriate word sequences; the use of correct person and
tense endings on verbs; the use of appropriate plural markers
on nouns, e.g., He a real nice person; I planted a seed, and
it grow into a flower; I have two best friend.

A B C D

4. GRAMMATICAL USAGE
Age-appropriate word choice and pronoun reference, e.g., Once
upon a time there was the dog named Sam; It's a nice horse,
and I think he's pretty.

A B C D 



5. GRAMMATICAL COMPLEXITY
Age-appropriate use of different sentence types, e.g., simple: We
couldn't play outside. It was raining; compound: It was raining
and we couldn't play outside; complex: sentences with subordinators
such as when, because, who, which, that, where, if, so, etc.:
We couldn't play outside because it was raining; I know where I'm
going to plant my garden; She's the witch who's good.



A B C D 



6. DESCRIPTIVE MODIFICATION

Age-appropriate use of descriptive adjectives and adverbial modifiers,
e.g., Yesterday my mother bought me bright red pants and
a shiny shirt; as compared to: My father bought me pants and
a shirt.



A B C D 






7. THEMATIC COHESION 

Sentences are related to each other in a meaningful way.



A B C D 



8. IDEA DEVELOPMENT 

The student's ability to elaborate on a topic. 



A B C D 






